-
Notifications
You must be signed in to change notification settings - Fork 7.2k
Revert "Revert "[Cluster Events] Add basic job events. (#29164)" (#29… #29196
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…9164)" (ray-project#29195)" This reverts commit 75a0c49.
Signed-off-by: SangBin Cho <rkooo567@gmail.com>
Signed-off-by: SangBin Cho <rkooo567@gmail.com>
|
cc @architkulkarni I also added the idempotent |
Signed-off-by: SangBin Cho <rkooo567@gmail.com>
|
test_dashboard is unrelated. It is flaky in the master, and the flaky test has been fixed from other PR |
|
@rkooo567 can you add some context to the PR description (what test was broken, how does this fix it) |
architkulkarni
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
monitor_jobs changes look good!
|
Ideally we would also have a unit test for |
|
Let me make a follow up PRs for unit tests. I will add a bunch of them including verifying events. |
|
the dashboard failure should be unrelated and fixed in the mastesr |
…9164)" (ray-project#29… (ray-project#29196) This reverts the PR and fixes the test failures This also fixes a bug around _monitor_jobs API. the monitor job can be called on the same job twice now, which will break the event (because at the end of monitor job, we record the event that job is completed). The same completed event can be reported twice without the fix upgrade gtest to 1.12.1 (ray-project#29902) While working on ray-project#28209 I hit issue google/googletest#3514, which causes a couple gtest-dependent tests to be unable to build Signed-off-by: Pablo E <pabloem@apache.org>
…9164)" (ray-project#29… (ray-project#29196) This reverts the PR and fixes the test failures This also fixes a bug around _monitor_jobs API. the monitor job can be called on the same job twice now, which will break the event (because at the end of monitor job, we record the event that job is completed). The same completed event can be reported twice without the fix Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
Lint failed after it is merged due to some conflict. This should fix the issue Signed-off-by: SangBin Cho <rkooo567@gmail.com> Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
…195)"
This reverts commit 75a0c49.
Why are these changes needed?
This reverts the PR and fixes the test failures
This also fixes a bug around
_monitor_jobsAPI. the monitor job can be called on the same job twice now, which will break the event (because at the end of monitor job, we record the event that job is completed). The same completed event can be reported twice without the fixRelated issue number
Checks
git commit -s) in this PR.scripts/format.shto lint the changes in this PR.